Abstract: We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that the two modalities enhance each other. This includes a series of techniques such as joint text-vision pre-training, zero-vision SFT, and joint text-vision reinforcement learning. Building on this multimodal foundation, K2.5 introduces Agent Swarm, a self-directed parallel agent orchestration framework that dynamically decomposes complex tasks into heterogeneous sub-problems and executes them concurrently. Extensive evaluations show that Kimi K2.5 achieves state-of-the-art results across various domains, including coding, vision, reasoning, and agentic tasks. Agent Swarm also reduces latency by up to $4.5\times$ relative to single-agent baselines. We release the post-trained Kimi K2.5 model checkpoint to facilitate future research and real-world applications of agentic intelligence.
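The decompose-then-run-concurrently pattern described above can be illustrated with a minimal sketch. It is not the Agent Swarm implementation: `decompose`, `run_subagent`, `orchestrate`, and `ideal_speedup` are hypothetical names, the decomposition is fixed rather than dynamic, and the sub-agents are stubs that do no real work.

```python
import asyncio


async def run_subagent(name: str, subtask: str) -> str:
    # Hypothetical stand-in for a sub-agent; a real one would call a model or tool.
    await asyncio.sleep(0)  # yield control, as I/O-bound work would
    return f"{name} finished: {subtask}"


def decompose(task: str) -> list[str]:
    # Assumed fixed three-way split; the abstract describes a dynamic decomposition.
    return [f"{task} / part {i}" for i in range(1, 4)]


async def orchestrate(task: str) -> list[str]:
    subtasks = decompose(task)
    # gather schedules all coroutines concurrently and returns results in input order.
    return await asyncio.gather(
        *(run_subagent(f"agent-{i}", s) for i, s in enumerate(subtasks))
    )


def ideal_speedup(durations: list[float]) -> float:
    # Sequential cost is the sum of durations; fully parallel cost is the slowest one.
    return sum(durations) / max(durations)


results = asyncio.run(orchestrate("survey"))
```

Under this toy model, `ideal_speedup` is an upper bound that ignores orchestration overhead and dependencies between sub-problems. It is unrelated to how the reported $4.5\times$ figure was measured.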




Abstract: Neural-symbolic computing aims to integrate robust neural learning and sound symbolic reasoning into a single framework, so as to leverage the complementary strengths of these two seemingly unrelated (perhaps even contradictory) AI paradigms. The central challenge is to unify the formulation of neural learning and symbolic reasoning under common semantics, that is, to find a joint representation between a neural model and a logical theory that supports the basic grounding learned by the neural model while adhering to the semantics of the logical theory. In this paper, we propose differentiable fuzzy $\mathcal{ALC}$ (DF-$\mathcal{ALC}$) for this role, as a neural-symbolic representation language with the desired semantics. DF-$\mathcal{ALC}$ unifies the description logic $\mathcal{ALC}$ and neural models for symbol grounding; in particular, it infuses an $\mathcal{ALC}$ knowledge base into neural models through differentiable concept and role embeddings. We define a hierarchical loss that enforces the constraint that the grounding learned by neural models must be semantically consistent with $\mathcal{ALC}$ knowledge bases. We find that capturing the semantics in grounding solely by maximizing satisfiability cannot revise the grounding rationally. We further define a rule-based loss for DF-$\mathcal{ALC}$, adapted to symbol grounding problems. The experimental results show that DF-$\mathcal{ALC}$ with the rule-based loss can improve the performance of image object detectors in an unsupervised manner, even in low-resource settings.
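To make the idea of fuzzy $\mathcal{ALC}$ semantics over concept and role embeddings concrete, the following sketch evaluates the usual $\mathcal{ALC}$ constructors on membership degrees in $[0, 1]$. It rests on assumptions not stated in the abstract: the product t-norm, the Reichenbach implication, and a mean-based subsumption loss are illustrative choices, and all function names are hypothetical. It implements neither the paper's hierarchical loss nor its rule-based loss.

```python
from typing import List

Concept = List[float]      # membership degree of each individual, in [0, 1]
Role = List[List[float]]   # Role[x][y]: degree to which x is related to y


def neg(c: Concept) -> Concept:
    # Standard fuzzy negation: 1 - degree.
    return [1.0 - v for v in c]


def conj(c: Concept, d: Concept) -> Concept:
    # C ⊓ D under the product t-norm (an assumed choice of t-norm).
    return [a * b for a, b in zip(c, d)]


def disj(c: Concept, d: Concept) -> Concept:
    # C ⊔ D under the probabilistic sum, the dual of the product t-norm.
    return [a + b - a * b for a, b in zip(c, d)]


def exists(r: Role, c: Concept) -> Concept:
    # (∃r.C)(x): maximum over y of r(x, y) * C(y).
    return [max(rxy * cy for rxy, cy in zip(row, c)) for row in r]


def forall(r: Role, c: Concept) -> Concept:
    # (∀r.C)(x): minimum over y of the Reichenbach implication r(x, y) ⇒ C(y).
    return [min(1.0 - rxy + rxy * cy for rxy, cy in zip(row, c)) for row in r]


def subsumption_loss(c: Concept, d: Concept) -> float:
    # 1 minus the mean degree of C(x) ⇒ D(x) across individuals.
    # It is 0 only if every individual has C(x) = 0 or D(x) = 1.
    degrees = [1.0 - a + a * b for a, b in zip(c, d)]
    return 1.0 - sum(degrees) / len(degrees)
```

Every operation here is built from sums, products, `max`, and `min`, so the same definitions written in an automatic-differentiation framework would admit (sub)gradients with respect to the membership degrees. That is the sense in which such a semantics can be called differentiable.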